A Hybrid Gaussian-HMM-Deep Learning Approach for Automatic Chord Estimation with Very Large Vocabulary
نویسندگان
چکیده
We propose a hybrid Gaussian-HMM-Deep-Learning approach for automatic chord estimation with very large chord vocabulary. The Gaussian-HMM part is similar to Chordino, which is used as a segmentation engine to divide input audio into note spectrogram segments. Two types of deep learning models are proposed to classify these segments into chord labels, which are then connected as chord sequences. Two sets of evaluations are conducted with two large chord vocabularies. The first evaluation is conducted in a recent MIREX standard way. Results show that our approach has obvious advantage over the state-of-the-art large-vocabulary-with-inversions supportable ACE system in terms of large vocabularies, although is outperformed by in small vocabularies. Through analyzing and deducing system behaviors behind the results, we see interesting chord confusion patterns made by different systems, which conceivably point to a demand of more balanced and consistent annotated datasets for training and testing. The second evaluation preliminarily demonstrates our approach’s superiority on a jazz chord vocabulary with 36 chord types, compared with a Chordino-like Gaussian-HMM baseline system with augmented vocabulary capacity.
منابع مشابه
Large Vocabulary Automatic Chord Estimation Using Deep Neural Nets: Design Framework, System Variations and Limitations
In this paper, we propose a new system design framework for large vocabulary automatic chord estimation. Our approach is based on an integration of traditional sequence segmentation processes and deep learning chord classification techniques. We systematically explore the design space of the proposed framework for a range of parameters, namely deep neural nets, network configurations, input fea...
متن کاملChord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations
The increasing accuracy of automatic chord estimation systems, the availability of vast amounts of heterogeneous reference annotations, and insights from annotator subjectivity research make chord label personalization increasingly important. Nevertheless, automatic chord estimation systems are historically exclusively trained and evaluated on a single reference annotation. We introduce a first...
متن کاملA Hybrid HMM/SVM Classifier for Wavelet Front End Robust Automatic Speech Recognition
Noisy ambient conditions pose a challenge to speech recognition, increasing the acoustic confusability, thereby looking for powerful acoustic models to improve the generalization ability of the machine learning and improve the recognition accuracy. This paper discusses a hybrid classifier that harness the power of hidden markov models (HMM) and the discriminative support vector machines (SVM) a...
متن کاملApplication of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretr...
متن کاملLarge vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling
In this paper, large vocabulary children’s speech recognition is investigated by using the Deep Neural Network Hidden Markov Model (DNN-HMM) hybrid and the Subspace Gaussian Mixture Model (SGMM) acoustic modeling approach. In the investigated scenario training data is limited to about 7 hours of speech from children in the age range 7-13 and testing data consists in read clean speech from child...
متن کامل